A Multiclassifier based Document Categorization System: profiting from the Singular Value Decomposition Dimensionality Reduction Technique

Authors

  • Ana Zelaia Jauregi
  • Iñaki Alegria
  • Olatz Arregi Uriarte
  • Basilio Sierra
Abstract

In this paper we present a multiclassifier approach for multilabel document classification problems, in which a set of k-NN classifiers predicts the category of text documents, each classifier working from a different subsample of the training database. These subsamples are drawn from the original training database by random subsampling. Bayesian voting is applied to combine the predictions generated by the multiclassifier. Throughout the classification process, a reduced-dimension vector representation obtained by Singular Value Decomposition (SVD) is used for training and testing documents. The good results of our experiments indicate the potential of the proposed approach.
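
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes a documents-by-terms matrix, cosine similarity for k-NN, a single predicted category per document, and a simplified stand-in for Bayesian voting in which each classifier's vote is weighted by its smoothed accuracy on the training documents left out of its subsample. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np

def svd_project(X_train, X_test, p):
    # Rows are documents, columns are terms. Keep the top-p right singular
    # vectors of the training matrix and project both sets onto them.
    _, _, Vt = np.linalg.svd(X_train, full_matrices=False)
    V_p = Vt[:p].T
    return X_train @ V_p, X_test @ V_p

def knn_predict(train_Z, train_y, z, k):
    # Cosine similarity is an assumption; the abstract names no metric.
    norms = np.linalg.norm(train_Z, axis=1) * np.linalg.norm(z)
    sims = (train_Z @ z) / np.where(norms == 0, 1.0, norms)
    nearest = np.argsort(-sims)[:k]
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

def multiclassifier_predict(X_train, y_train, X_test, p=2, k=3,
                            n_classifiers=5, sample_size=6, seed=0):
    rng = np.random.default_rng(seed)
    Z_train, Z_test = svd_project(X_train, X_test, p)
    n = len(y_train)
    classes = np.unique(y_train)
    scores = np.zeros((len(Z_test), len(classes)))
    for _ in range(n_classifiers):
        # Each k-NN classifier sees its own random subsample of the training set.
        idx = rng.choice(n, size=sample_size, replace=False)
        held_out = np.setdiff1d(np.arange(n), idx)
        # Assumed confidence estimate: Laplace-smoothed accuracy on the
        # training documents this classifier did not see.
        hits = sum(
            knn_predict(Z_train[idx], y_train[idx], Z_train[j], k) == y_train[j]
            for j in held_out
        )
        weight = (hits + 1) / (len(held_out) + 2)
        for i, z in enumerate(Z_test):
            pred = knn_predict(Z_train[idx], y_train[idx], z, k)
            scores[i, np.searchsorted(classes, pred)] += weight
    # The category with the largest accumulated weight wins.
    return classes[np.argmax(scores, axis=1)]

# Toy data: class 0 uses only the first two terms, class 1 the last two.
X_train = np.array([[2, 1, 0, 0], [3, 1, 0, 0], [2, 2, 0, 0], [4, 1, 0, 0],
                    [0, 0, 1, 5], [0, 0, 2, 6], [0, 0, 1, 4], [0, 0, 2, 7]],
                   dtype=float)
y_train = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_test = np.array([[3, 2, 0, 0], [0, 0, 2, 5]], dtype=float)
predictions = multiclassifier_predict(X_train, y_train, X_test)
```

Weighting votes by a per-classifier confidence, rather than counting them equally, is what separates this scheme from plain majority voting. A multilabel variant would need a rule for returning more than one category per document, which the abstract does not specify.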

Similar resources

Dimensionality Reduction Aids Term Co-Occurrence Based Multi-Document Summarization

A key task in an extraction system for query-oriented multi-document summarisation, necessary for computing relevance and redundancy, is modelling text semantics. In the Embra system, we use a representation derived from the singular value decomposition of a term co-occurrence matrix. We present methods to show the reliability of performance improvements. We find that Embra performs better with...

A multiclass/multilabel document categorization system: Combining multiple classifiers in a reduced dimension

This article presents a multiclassifier approach for multiclass/multilabel document categorization problems. For the categorization process, we use a reduced vector representation obtained by SVD for training and testing documents, and a set of k-NN classifiers to predict the category of test documents; each k-NN classifier uses a reduced database subsampled from the original training database....

Exploring Basque Document Categorization for Educational Purposes using LSI

In the process of preparing learning material for Computer Supported Learning Systems (CSLSs), one of the first steps involves finding documents relevant to the topics and to the students. This requires documents to be categorized according to some criteria. In this paper we analyze the behaviour of classification techniques such as Naïve Bayes, Winnow, SVMs and k-NN, together with lemmatizatio...

Document Clustering: Before and After the Singular Value Decomposition

Document Clustering is an issue of measuring similarity between documents and grouping similar documents together. Information Retrieval (IR) is an issue of comparing query with a collection of documents to locate a set of documents relevant to a particular query. In the vector space IR model, a query is treated as a document which consists of a few terms. Therefore, in both clustering and retr...

Singular Value Decomposition based Steganography Technique for JPEG2000 Compressed Images

In this paper, a steganography technique for JPEG2000 compressed images using singular value decomposition in wavelet transform domain is proposed. In this technique, DWT is applied on the cover image to get wavelet coefficients and SVD is applied on these wavelet coefficients to get the singular values. Then secret data is embedded into these singular values using scaling factor. Different com...

Publication date: 2006